AD-DROP for Token-Level Tasks

Neural Information Processing Systems

For token-level tasks (e.g., NER and text generation), there are several logit outputs, each producing its own attribution matrix for each attention map, so applying AD-DROP raises the challenge of how to fuse these attribution matrices. The results on the test sets are reported in Table 1 and Table 2. Moreover, to verify that AD-DROP can be adapted to other pre-trained models, we choose ELECTRA as the base model for CoNLL-2003 NER and OPUS-MT for WMT 2016. We discuss potential limitations of AD-DROP as follows.